
    Efficient Incremental Data Analysis

    Many data-intensive applications require real-time analytics over streaming data. In a growing number of domains (sensor network monitoring, social web applications, clickstream analysis, high-frequency algorithmic trading, and fraud detection, to name a few), applications continuously monitor stream events to react promptly to specific data conditions. These applications demand responsive analytics even when faced with a high volume and velocity of incoming changes, large numbers of users, and complex processing requirements. Developing an online analytics engine that meets these requirements is challenging. In this thesis, we study techniques for efficient online processing of complex analytical queries, ranging from standard database queries to complex machine learning and digital signal processing workflows. First, we focus on the problem of efficient incremental computation for database queries. We have developed a system, called DBToaster, that compiles declarative queries into high-performance stream processing engines that keep query results (views) fresh at very high update rates. At the heart of our system is a recursive query compilation algorithm that materializes a set of supporting higher-order delta views to achieve a substantially lower view maintenance cost. We study the trade-offs between single-tuple and batch incremental processing in local execution, and we present a novel approach for compiling view maintenance code into data-parallel programs optimized for distributed execution. DBToaster supports millions of complete view refreshes per second for a broad range of queries and outperforms commercial database and stream engines by orders of magnitude. We also study incremental computation for queries written as iterative linear algebra, which can capture many machine learning and scientific calculations.
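The idea of materializing supporting delta views can be illustrated on a toy aggregate. The Python sketch below is a hand-written illustration, not DBToaster's generated code: the query Q = SUM(R.a * S.b) over R(k, a) joined with S(k, b) on k, and the names JoinSumView, sum_a, sum_b and recompute, are hypothetical. The per-key sums play the role of the auxiliary views, so each single-tuple insert or delete updates Q in constant time instead of re-running the join.

```python
from collections import defaultdict


class JoinSumView:
    """Incrementally maintains Q = SUM(R.a * S.b) over R(k, a) JOIN S(k, b) ON k.

    sum_a and sum_b are auxiliary views: the delta of Q for an R-update needs
    S only through sum_b[k], and vice versa, so each single-tuple update
    costs O(1) instead of a full join.
    """

    def __init__(self):
        self.q = 0                     # top-level view Q
        self.sum_a = defaultdict(int)  # per key k: SUM(a) over R
        self.sum_b = defaultdict(int)  # per key k: SUM(b) over S

    def update_r(self, k, a, mult=1):
        """Apply an insert (mult=+1) or delete (mult=-1) of R(k, a)."""
        self.q += mult * a * self.sum_b[k]  # first-order delta of Q
        self.sum_a[k] += mult * a           # delta of the auxiliary view

    def update_s(self, k, b, mult=1):
        """Apply an insert (mult=+1) or delete (mult=-1) of S(k, b)."""
        self.q += mult * b * self.sum_a[k]
        self.sum_b[k] += mult * b


def recompute(r, s):
    """Naive re-evaluation of Q, used only to check the incremental result."""
    return sum(a * b for k1, a in r for k2, b in s if k1 == k2)


view, R, S = JoinSumView(), [], []
for rel, k, x in [("R", 1, 2), ("S", 1, 5), ("S", 2, 7), ("R", 1, 3), ("R", 2, 4)]:
    if rel == "R":
        view.update_r(k, x)
        R.append((k, x))
    else:
        view.update_s(k, x)
        S.append((k, x))
view.update_r(1, 2, mult=-1)  # delete R(1, 2)
R.remove((1, 2))
print(view.q, recompute(R, S))  # 43 43
```

The updates to sum_a and sum_b are themselves constant (they do not depend on the other relation), which is where the recursion of this toy example bottoms out.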
We have developed a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect in which even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change and make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our last research question concerns the integration of general-purpose query processors and domain-specific operations to enable deep data exploration in both online and offline analysis. We advocate a deep integration of signal processing operations and general-purpose query processors. We demonstrate that in-situ processing of tempo-relational and signal data through a unified query language lets users express end-to-end workflows more succinctly inside one system, while offering orders of magnitude better performance than existing popular data management systems.
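The avalanche effect and its containment by factorization can be seen on the simplest program, the view V = A^2. The NumPy sketch below is an illustration under stated assumptions, not LINVIEW itself: the function name rank1_update_square is hypothetical, and only a single rank-1 update to A is handled. Changing one row of A changes, in general, every entry of A^2, yet the change is a rank-2 matrix that can be kept as two n x 2 factors and computed with two matrix-vector products.

```python
import numpy as np


def rank1_update_square(A, V, u, v):
    """Given V == A @ A, return (A_new, V_new) for A_new = A + outer(u, v).

    (A + u v^T)^2 = A^2 + A u v^T + u v^T A + u (v^T u) v^T, which equals
    V + P @ Q.T with the n x 2 factors below. Forming the factors costs
    O(n^2) (two matrix-vector products) versus O(n^3) to recompute A_new @ A_new.
    """
    Au = A @ u
    Atv = A.T @ v
    vu = v @ u                             # scalar v^T u
    P = np.column_stack([Au + vu * u, u])  # n x 2
    Q = np.column_stack([v, Atv])          # n x 2
    A_new = A + np.outer(u, v)
    V_new = V + P @ Q.T                    # rank-2 correction of the view
    return A_new, V_new


rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
V = A @ A

# Changing a single row i of A is a rank-1 update with u = e_i.
i = 7
u = np.zeros(n)
u[i] = 1.0
v = rng.standard_normal(n)  # the change applied to row i

A2, V2 = rank1_update_square(A, V, u, v)
print(np.allclose(V2, A2 @ A2))           # True
print(np.linalg.matrix_rank(V2 - V))      # at most 2
```

For longer programs (higher powers, iterative methods) the same trick must be applied to every intermediate result, which is the bookkeeping the abstract attributes to LINVIEW.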

    Investigation of possible causes for appearance of a crack in the welded joint of the ship winch frame

    Ship winches are among the most important parts of ship equipment, since they perform the most demanding tasks on many types of ships. In most cases, ship winches are welded structures. The calculations required by the applicable standards, which must be carried out before the structure is fabricated, should also include verification by the finite element method. To meet the high reliability requirements, the integrity of the welded joints of all parts must be examined before the parts are assembled into the winch. After all tests are conducted and the parts are assembled into the winch, a factory acceptance test (FAT) must be performed. These tests should reveal any flaws introduced during manufacturing. This paper describes a very unusual crack that appeared in the ship winch frame during the FAT. A finite element simulation was performed to obtain the stresses at which the crack appeared, and the possible causes of the crack are considered. Measures to minimize the occurrence of such cracks are proposed, together with directions for further research on this problem.

    Antineutrophil cytoplasmic antibody (ANCA)-associated autoimmune diseases induced by antithyroid drugs: comparison with idiopathic ANCA vasculitides

    Clinical and serological profiles of idiopathic and drug-induced autoimmune diseases can be very similar. We compared data from idiopathic and antithyroid drug (ATD)-induced antineutrophil cytoplasmic antibody (ANCA)-positive patients. From 1993 to 2003, 2474 patients were tested for ANCA in the Laboratory for Allergy and Clinical Immunology in Belgrade. Of these 2474 patients, 72 (2.9%) were anti-proteinase 3 (PR3)- or anti-myeloperoxidase (MPO)-positive, and their clinical and serological data were analyzed. The first group consisted of ANCA-associated idiopathic systemic vasculitis (ISV), diagnosed in 56/72 patients: 29 Wegener's granulomatosis (WG), 23 microscopic polyangiitis (MPA) and four Churg-Strauss syndrome. The second group consisted of 16/72 patients who became ANCA-positive during ATD therapy (12 receiving propylthiouracil and four receiving methimazole). We determined ANCA and antinuclear (ANA) antibodies by indirect immunofluorescence; PR3-ANCA, MPO-ANCA, anticardiolipin (aCL) and antihistone antibodies (AHA) by ELISA; and cryoglobulins by precipitation. Complement components C3 and C4, alpha-1 antitrypsin (α1 AT) and C reactive protein (CR-P) were measured by nephelometry. Renal lesions were present in 3/16 (18.8%) ATD-treated patients and in 42/56 (75%) ISV patients (p <0.001). Skin lesions occurred in 10/16 (62.5%) ATD-treated patients and 14/56 (25%) ISV patients (p <0.01). ATD-treated patients more frequently had MPO-ANCA, ANA, AHA, aCL, cryoglobulins and low C4 (p <0.01). ISV patients more frequently had low α1 AT (p = 0.059) and high CR-P (p <0.001). Of 16 ATD-treated patients, four had drug-induced ANCA vasculitis (three MPA and one WG), while 12 had lupus-like disease (LLD). Of 56 ISV patients, 13 died and eight developed terminal renal failure (TRF). There were no deaths in the ATD-treated group, but one of the 16 patients, who had methimazole-induced MPA, developed pulmonary-renal syndrome with progression to TRF.
ANCA-positive ISV had a more severe course than ATD-induced ANCA-positive diseases. Clinically and serologically, ANCA-positive ATD-treated patients can be divided into two groups: the first consisting of patients with drug-induced WG or MPA, which resemble ISV, and the second consisting of patients with LLD. Different serological profiles could help in the differential diagnosis and in choosing an adequate therapeutic approach for ANCA-positive ATD-treated patients with symptoms of systemic disease.

    F-IVM: Learning over Fast-Evolving Relational Data

    F-IVM is a system for real-time analytics, such as machine learning applications, over training datasets defined by queries over fast-evolving relational databases. We will demonstrate F-IVM for three such applications: model selection, Chow-Liu trees, and ridge linear regression.
    Comment: SIGMOD DEMO 2020, 5 pages
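Ridge linear regression only needs the aggregates X^T X and X^T y, and both can be maintained under inserts and deletes of training rows. The NumPy sketch below shows that idea in its simplest, non-factorized form; it is an assumption-laden illustration, not F-IVM's API. The class name RidgeAggregates, the regularization weight lam and the synthetic data are hypothetical, and unlike F-IVM the sketch assumes the training matrix is given directly rather than defined by a join query over the database.

```python
import numpy as np


class RidgeAggregates:
    """Maintains Sigma = X^T X and c = X^T y under single-row inserts and deletes.

    The ridge solution theta = (Sigma + lam * I)^{-1} c is then available at
    any time without rescanning the training data. Each update costs O(d^2).
    """

    def __init__(self, d, lam=1.0):
        self.lam = lam
        self.sigma = np.zeros((d, d))
        self.c = np.zeros(d)

    def update(self, x, y, mult=1):
        """mult=+1 inserts the row (x, y); mult=-1 deletes it."""
        x = np.asarray(x, dtype=float)
        self.sigma += mult * np.outer(x, x)
        self.c += mult * y * x

    def solve(self):
        d = self.c.shape[0]
        return np.linalg.solve(self.sigma + self.lam * np.eye(d), self.c)


rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

agg = RidgeAggregates(d=3, lam=0.5)
for xi, yi in zip(X, y):
    agg.update(xi, yi)
agg.update(X[0], y[0], mult=-1)  # the first row is deleted from the data

# Batch check against the remaining 99 rows.
Xr, yr = X[1:], y[1:]
theta_batch = np.linalg.solve(Xr.T @ Xr + 0.5 * np.eye(3), Xr.T @ yr)
print(np.allclose(agg.solve(), theta_batch))  # True
```

Because the aggregates are sums, deletes are just inserts with negative multiplicity, which is what makes this kind of maintenance cheap.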

    Evaluation Trade-Offs for Acyclic Conjunctive Queries

    We consider the evaluation of acyclic conjunctive queries, where the evaluation time is decomposed into preprocessing time and enumeration delay. In a seminal paper at CSL'07, Bagan, Durand, and Grandjean showed that acyclic queries can be evaluated with linear preprocessing time and linear enumeration delay. If the query is free-connex, the enumeration delay becomes constant. Further prior work showed that constant enumeration delay can be achieved for arbitrary acyclic conjunctive queries at the expense of a preprocessing time that is characterised by the fractional hypertree width. We introduce an approach that exposes a trade-off between preprocessing time and enumeration delay for acyclic conjunctive queries. The aforementioned prior works represent extremes in this trade-off space. Our approach also achieves preprocessing times and enumeration delays between these extremes; in particular, the delay may lie between constant and linear time. Our approach decomposes the given query into subqueries and achieves for each subquery a trade-off that depends on a parameter controlling the times for preprocessing and enumeration. The complexity of the query is given by the Pareto optimal points of a bi-objective optimisation program whose inputs are possible query decompositions and parameter values.
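The two extremes of the trade-off can be made concrete on the acyclic, non-free-connex query Q(A, C) = R(A, B), S(B, C), where B is projected away. The Python sketch below shows only those two endpoints, not the paper's decomposition-based approach; the functions build_indexes, eager and lazy and the sample relations are hypothetical. The eager strategy pays up to quadratic preprocessing to materialize distinct answers and then has constant delay; the lazy strategy keeps preprocessing linear but may scan many duplicate answers between two new ones.

```python
from collections import defaultdict


def build_indexes(R, S):
    """Linear-time preprocessing: index S by B and drop dangling R tuples
    (a semijoin reduction), so every remaining a has at least one answer."""
    s_by_b = defaultdict(set)
    for b, c in S:
        s_by_b[b].add(c)
    r_by_a = defaultdict(set)
    for a, b in R:
        if b in s_by_b:
            r_by_a[a].add(b)
    return r_by_a, s_by_b


def eager(R, S):
    """Extreme 1: materialize the distinct answers up front (the preprocessing
    can be quadratic in the input size); afterwards each answer costs O(1)."""
    r_by_a, s_by_b = build_indexes(R, S)
    answers = {(a, c) for a, bs in r_by_a.items() for b in bs for c in s_by_b[b]}
    return iter(answers)


def lazy(R, S):
    """Extreme 2: only linear preprocessing; duplicates reached via different
    b values are skipped during enumeration, so the delay can be linear."""
    r_by_a, s_by_b = build_indexes(R, S)

    def gen():
        for a, bs in r_by_a.items():
            seen = set()
            for b in bs:
                for c in s_by_b[b]:
                    if c not in seen:
                        seen.add(c)
                        yield (a, c)

    return gen()


R = [(1, "x"), (1, "y"), (2, "y"), (3, "z")]
S = [("x", 10), ("y", 10), ("y", 20)]
print(sorted(eager(R, S)))  # [(1, 10), (1, 20), (2, 10), (2, 20)]
print(sorted(lazy(R, S)))   # same answers, each emitted once
```

The duplicate (1, 10), reachable through both b = "x" and b = "y", is exactly what forces the lazy strategy to do unproductive work between outputs.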

    TraNCE: Transforming Nested Collections Efficiently

    Nested relational query languages have long been seen as an attractive tool for scenarios involving large hierarchical datasets. There has been a resurgence of interest in nested relational languages. One driver has been the affinity of these languages for large-scale processing platforms such as Spark and Flink. This demonstration gives a tour of TraNCE, a new system for processing nested data on top of distributed processing systems. The core innovation of the system is a compiler that processes nested relational queries in a series of transformations; these include variants of two prior techniques, shredding and unnesting, as well as a materialization transformation that customizes the way levels of the nested output are generated. The TraNCE platform builds on these techniques by adding components for users to create and visualize queries, as well as data exploration and notebook execution targets to facilitate the construction of large-scale data science applications. The demonstration will both showcase the system from the viewpoint of usability by data scientists and illustrate the data management techniques employed.
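Shredding replaces each inner collection by a label and stores the inner tuples in a separate flat relation keyed by that label, so that flat engines (such as the distributed platforms mentioned above) can process them with ordinary joins and aggregations. The Python sketch below is a local toy of the general shredding idea, not TraNCE's compiler or its generated Spark code; the customer/orders schema and the functions shred, unshred and total_qty are hypothetical.

```python
import itertools


def shred(customers):
    """Flatten a nested collection into a top-level relation and a dictionary
    relation. Each inner bag is replaced by a fresh label that keys its tuples."""
    labels = itertools.count()
    top, orders_dict = [], []
    for cust in customers:
        label = next(labels)
        top.append((cust["name"], label))
        for item, qty in cust["orders"]:
            orders_dict.append((label, item, qty))
    return top, orders_dict


def unshred(top, orders_dict):
    """Rebuild the nested output by grouping dictionary tuples on their label."""
    by_label = {}
    for label, item, qty in orders_dict:
        by_label.setdefault(label, []).append((item, qty))
    return [{"name": name, "orders": by_label.get(label, [])} for name, label in top]


def total_qty(top, orders_dict):
    """A query evaluated entirely on flat relations: total quantity per customer."""
    sums = {}
    for label, _item, qty in orders_dict:
        sums[label] = sums.get(label, 0) + qty
    return {name: sums.get(label, 0) for name, label in top}


customers = [
    {"name": "ann", "orders": [("pen", 2), ("ink", 1)]},
    {"name": "bob", "orders": []},
]
top, orders_dict = shred(customers)
print(top)                                     # [('ann', 0), ('bob', 1)]
print(orders_dict)                             # [(0, 'pen', 2), (0, 'ink', 1)]
print(unshred(top, orders_dict) == customers)  # True
print(total_qty(top, orders_dict))             # {'ann': 3, 'bob': 0}
```

Empty inner collections leave no tuples in the dictionary relation, so unshredding has to default missing labels to an empty list, as done above.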

    Conjunctive Queries with Free Access Patterns under Updates

    We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.
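A small example makes the input/output split concrete. The Python sketch below uses the hypothetical query Q(B, C | A) = R(A, B), S(A, C), with A as input and B, C as outputs, where constant-time updates and constant-delay enumeration per input value are easy to see directly; it does not implement or check the paper's syntactic characterisation. The class CrossProductAP and its methods are hypothetical names, and "constant time" here relies on expected O(1) Python hash-set operations.

```python
from collections import defaultdict


class CrossProductAP:
    """Toy query Q(B, C | A) = R(A, B), S(A, C) with input A and outputs B, C.

    Only the two indexes are stored; the (possibly quadratic) answer set is
    never materialized. Updates are single hash-set operations (expected O(1)),
    and answers for a given input a are enumerated with constant delay.
    """

    def __init__(self):
        self.r = defaultdict(set)  # a -> {b : R(a, b)}
        self.s = defaultdict(set)  # a -> {c : S(a, c)}

    def insert_r(self, a, b):
        self.r[a].add(b)

    def delete_r(self, a, b):
        self.r[a].discard(b)

    def insert_s(self, a, c):
        self.s[a].add(c)

    def delete_s(self, a, c):
        self.s[a].discard(c)

    def answers(self, a):
        """Yield every (b, c) with R(a, b) and S(a, c); do not update while iterating."""
        bs, cs = self.r.get(a), self.s.get(a)
        if not bs or not cs:  # empty answer detected in O(1)
            return
        for b in bs:
            for c in cs:  # cs is non-empty, so every inner step emits a tuple
                yield (b, c)


q = CrossProductAP()
q.insert_r(1, "b1")
q.insert_r(1, "b2")
q.insert_s(1, "c1")
q.insert_s(2, "c9")
print(sorted(q.answers(1)))  # [('b1', 'c1'), ('b2', 'c1')]
q.delete_r(1, "b2")
print(sorted(q.answers(1)))  # [('b1', 'c1')]
print(list(q.answers(2)))    # []
```

The non-emptiness check before the nested loops is what keeps the delay constant: without it, an input value with many B-values but no C-values would force a long scan that emits nothing.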